Abstract

In high-frequency financial data, not only returns but also waiting times between
trades are random variables. In this work, we analyze the spectra of the waiting-time
processes for tick-by-tick trades. The numerical problem, which is closely related to
the real inversion of Laplace transforms, is analyzed by means of Tikhonov's
regularization method. We also analyze these spectra with a rougher method based on
a comb of Dirac delta functions.

Key words: Econophysics; Exponential distribution; Inverse problems 
PACS: 89.65.Gh; 02.30.Zz



1 Introduction 



It has been previously shown that waiting times between orders, as well as between
trades, do not follow an exponential distribution [1,2,3]. This phenomenon can
be explained by the variable activity during the trading day, which leads to
describing the distribution of durations by a suitable mixture of exponential
distributions [4]. In this paper, we study the activity spectrum by numerically
inverting the empirical survival function. The paper is organized as follows. In
section 2, we give the basic theoretical background and present two methods
to derive the activity spectrum. In section 3, the methods are applied to real
financial data. Finally, section 4 contains our conclusions.
2 Theory



In tick-by-tick financial data, the waiting time (duration) $\tau$ between two
consecutive trades is a random variable. Let us call $\psi(\tau)$ the probability
density of durations. If we suppose that the duration process is a mixture of
exponential processes, we can write:

$$\psi(\tau) = \int_0^\infty g(\lambda)\, \lambda\, e^{-\lambda \tau}\, d\lambda, \qquad (1)$$

where $g(\lambda)$ is the activity spectrum, satisfying

$$\int_0^\infty g(\lambda)\, d\lambda = 1. \qquad (2)$$

A similar equation can be written for the survival function (the complementary
cumulative distribution function) $\Psi(\tau) = 1 - \int_0^\tau \psi(\tau')\, d\tau'$:

$$\Psi(\tau) = \int_0^\infty g(\lambda)\, e^{-\lambda \tau}\, d\lambda. \qquad (3)$$
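As a quick check of eqs. (1) to (3), the following minimal standard-library sketch draws waiting times from a mixture of exponentials whose spectrum is a sum of two delta peaks. It then compares the empirical survival function with $\Psi(\tau) = \sum_i a_i e^{-\lambda_i \tau}$, which is eq. (3) for a discrete spectrum. The weights (0.7, 0.3), the rates (0.2 and 0.02 per second) and the helper names are hypothetical and chosen only for illustration. They are not estimates from the data discussed later.

```python
import math
import random

def sample_mixture(weights, rates, n, seed=0):
    """Draw n waiting times from a mixture of exponentials.

    For each draw, a rate lambda_i is picked with probability a_i, and the
    waiting time is then exponentially distributed with that rate.
    """
    rng = random.Random(seed)
    return [rng.expovariate(rng.choices(rates, weights=weights)[0])
            for _ in range(n)]

def survival_mixture(tau, weights, rates):
    """Eq. (3) for a discrete spectrum: Psi(tau) = sum_i a_i exp(-lambda_i tau)."""
    return sum(a * math.exp(-lam * tau) for a, lam in zip(weights, rates))

def empirical_survival(taus, tau):
    """Fraction of observed waiting times strictly larger than tau."""
    return sum(t > tau for t in taus) / len(taus)

# Hypothetical two-level activity spectrum (rates in 1/s).
weights = [0.7, 0.3]
rates = [0.2, 0.02]
taus = sample_mixture(weights, rates, 100_000)
for tau in (1.0, 10.0, 50.0):
    print(tau, empirical_survival(taus, tau), survival_mixture(tau, weights, rates))
```

The two columns should agree to within sampling noise. The sample mean should also approach $\sum_i a_i/\lambda_i = 0.7/0.2 + 0.3/0.02 = 18.5$ s.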



From eq. (3), the activity spectrum can be seen as the solution of a Fredholm
integral equation of the first kind:

$$\Psi(\tau) = \int_0^\infty g(\lambda)\, K(\lambda, \tau)\, d\lambda, \qquad (4)$$

with the kernel K equal to 

$$K(\lambda, \tau) = e^{-\lambda \tau}. \qquad (5)$$



$\Psi(\tau)$ is easily estimated from empirical data. However, the problem then
becomes the real inversion of a Laplace transform from discrete and noisy
data [5].

We can rewrite our linear problem in discrete form by defining the matrix

$$K = \{k_{ij}\}, \qquad k_{ij} = e^{-h\, i\, j}, \qquad (6)$$

for $i, j = 1, \ldots, \tau_{\max}$, where $\tau_{\max}$ is the largest waiting time (in seconds).
The value of the parameter $h$ has to be chosen so as to cover the whole spectrum. In
fact, we can think of the index $j$ as the waiting time, whereas $\lambda_i = h\, i$ are the
values of $\lambda$ at which we want to determine the unknown function $g(\lambda)$. Since $K$ is
symmetric, the roles of the two indices can be exchanged. Eq. (3) then becomes a
matrix equation:

$$\Psi = K\, g, \qquad (7)$$
where $\Psi$ is the vector of the values of $\Psi$ and $g$ is the unknown vector of activities
$g(\lambda_i)$. The problem is ill-conditioned. The ratio between the largest
and smallest elements of the matrix $K$ is $e^{h(\tau_{\max}^2 - 1)}$, and the ratio
between the largest and smallest empirical values of $\Psi$ is equal to the
number of duration data points. For the data set described below, $\tau_{\max} = 196$ s
and the number of durations is 55559.
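The size of this ill-conditioning can be checked directly with a minimal numpy sketch. It assumes that waiting times are measured in whole seconds, so that the indices of eq. (6) run over $1, \ldots, \tau_{\max}$. The sketch builds the matrix of eq. (6) with the values quoted in the text ($h = 0.0015$, $\tau_{\max} = 196$). The entry ratio $e^{h(\tau_{\max}^2 - 1)}$ comes out at about $10^{25}$. The function name `kernel_matrix` is hypothetical.

```python
import numpy as np

def kernel_matrix(h, tau_max):
    """Matrix of eq. (6): k_ij = exp(-h*i*j) for i, j = 1..tau_max.

    One index plays the role of the waiting time (in seconds), the other
    labels lambda = h*index; the matrix is symmetric.
    """
    idx = np.arange(1, tau_max + 1, dtype=float)
    return np.exp(-h * np.outer(idx, idx))

h, tau_max = 0.0015, 196
K = kernel_matrix(h, tau_max)

# Largest entry at i = j = 1, smallest at i = j = tau_max:
# exp(-h) / exp(-h * tau_max**2) = exp(h * (tau_max**2 - 1)).
ratio = K.max() / K.min()
print(f"max/min entry ratio: {ratio:.3e}")
print(f"h*(tau_max^2 - 1) = {h * (tau_max**2 - 1):.2f}")

# SVD-based condition number: astronomically large, i.e. K is
# numerically singular in double precision.
print(f"condition number: {np.linalg.cond(K):.3e}")
```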



2.1 Tikhonov's method



Several techniques are available in applied mathematics to solve ill-conditioned 
linear systems. One of the most powerful and commonly used is Tikhonov's 
regularization method [6,7,8]. 

We can regard the solution of a linear system as the minimizer of the functional

$$L[g] = \| K g - \Psi \|^2, \qquad (8)$$

and the key idea of the method is to introduce a regularization parameter, a
positive real number $\mu$, and a regularization matrix (often the identity matrix
$I$) such that the functional becomes

$$L_\mu[g] = \| K g - \Psi \|^2 + \mu \| g \|^2, \qquad \mu > 0. \qquad (9)$$

Theorems for error estimation are available in [6,7,8].

Once a value of $\mu$ has been chosen, finding the minimizer $g_\mu$ of $L_\mu[g]$ reduces
to inverting a matrix. The remaining difficulty is the choice of an optimal $\mu$.
In fact, one can show that

$$g_\mu = (K^T K + \mu I)^{-1} K^T \Psi, \qquad (10)$$

where $K^T$ is the transpose of $K$. To find the optimal value of $\mu$, one can
usually apply the Generalized Cross Validation technique [9] or the
L-curve method [10], which are less sensitive to the ill-conditioning of the matrix.
In our case, however, the matrix is too ill-conditioned even for these methods. To
circumvent this difficulty, we applied Tikhonov's method for a large number
of different values of $\mu$ and compared the rebuilt survival function

$$\Psi_\mu = K g_\mu, \qquad (11)$$

with the empirical one by means of the Kolmogorov-Smirnov (KS) test. The value of
$\mu$ giving the highest KS probability yields the best fit to the empirical data.
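The procedure of eqs. (10) and (11) can be sketched with numpy as follows. Setting the gradient of eq. (9) with respect to $g$ to zero gives $(K^T K + \mu I)\, g_\mu = K^T \Psi$, which the sketch solves directly rather than forming the inverse. For each $\mu$ on a grid it rebuilds $\Psi_\mu = K g_\mu$. The fit is then scored by the KS distance $D = \max_\tau |\Psi_\mu(\tau) - \Psi(\tau)|$, converted to a probability with the asymptotic Kolmogorov series for $N$ data points. The text does not specify the KS variant, the $\mu$ grid, or how the empirical $\Psi$ is binned, so all three are assumptions here. The helper names (`tikhonov_solve`, `ks_probability`, `scan_mu`) and the synthetic two-level input are hypothetical.

```python
import numpy as np

def tikhonov_solve(K, psi, mu):
    """Eq. (10): g_mu = (K^T K + mu I)^(-1) K^T psi, computed with a linear solve."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + mu * np.eye(n), K.T @ psi)

def ks_probability(d, n_data, terms=100):
    """Asymptotic Kolmogorov tail probability Q(sqrt(n_data) * d).

    Q(x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2). For x < 0.2 the value
    differs from 1 by less than 1e-12, so 1.0 is returned directly.
    """
    x = np.sqrt(n_data) * d
    if x < 0.2:
        return 1.0
    k = np.arange(1, terms + 1)
    q = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * x**2))
    return float(min(max(q, 0.0), 1.0))

def scan_mu(K, psi, n_data, mus):
    """Solve eq. (10) for every mu, rebuild eq. (11), and score it with KS."""
    probs = []
    for mu in mus:
        psi_mu = K @ tikhonov_solve(K, psi, mu)   # eq. (11)
        d = np.max(np.abs(psi_mu - psi))          # KS distance
        probs.append(ks_probability(d, n_data))
    best = int(np.argmax(probs))
    return mus[best], probs[best], probs

# Hypothetical synthetic input: a two-level mixture sampled at tau = 1..196 s.
h, tau_max, n_data = 0.0015, 196, 55559
idx = np.arange(1, tau_max + 1, dtype=float)
K = np.exp(-h * np.outer(idx, idx))               # eq. (6)
psi = 0.7 * np.exp(-0.2 * idx) + 0.3 * np.exp(-0.02 * idx)
mus = np.logspace(-4, 1, 26)
best_mu, best_prob, probs = scan_mu(K, psi, n_data, mus)
print(f"best mu = {best_mu:.4g}, KS probability = {best_prob:.3g}")
```

For a fixed $N$, the KS probability decreases monotonically with $D$. Ranking the values of $\mu$ by probability is therefore the same as ranking them by distance.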
2.2 The method of Dirac's delta comb

In this case, we assume the spectrum to be a comb of Dirac's delta functions: 

$$g(\lambda) = \sum_{i=1}^{M} a_i\, \delta(\lambda - \lambda_i), \qquad (12)$$

where $M$ is the number of time intervals of constant activity $\lambda_i$ into
which the trading period has been divided, and the $a_i$ are suitable weights such
that $\sum_{i=1}^{M} a_i = 1$. As a consequence, the survival function becomes:

$$\Psi_M(\tau) = \sum_{i=1}^{M} a_i\, e^{-\lambda_i \tau}. \qquad (13)$$

We use the following procedure to estimate the parameters $a_i$ and $\lambda_i$. Let us fix
a time window $\Delta T$, and let us consider the minimum number $N_j$ of waiting
times for which the sum

$$T_j = \sum_{i=1}^{N_j} \tau_i \qquad (14)$$

is larger than $\Delta T$. Then a term is added to eq. (12) with the following
parameters:

$$\lambda_j = N_j / T_j, \qquad a_j = N_j / N, \qquad (15)$$

where $N$ is the total number of waiting times. The next interval starts right after
the waiting time $\tau_{N_j}$. In this way, the normalization

$$\sum_{i=1}^{M} a_i = 1$$

arises naturally. With this method, the value of $M$ is not known in advance.
Again, we test the rebuilt survival function with the Kolmogorov-Smirnov test
for different values of $\Delta T$ in order to find the optimal window size. We also
used this simple method to estimate the parameter $h$ in the matrix (6).
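The time-splitting procedure of eqs. (14) and (15) can be sketched in standard-library Python as follows. Waiting times are taken in chronological order and accumulated until their sum first exceeds $\Delta T$. Each completed block contributes $\lambda_j = N_j/T_j$ and $a_j = N_j/N$. The text does not say how a final incomplete block is handled. Here it is assumed to be kept as the last term of the comb, so that the weights sum to one. The window is scored by the KS distance between eq. (13) and the empirical survival function. The candidate windows, the helper names and the two-regime synthetic session are all hypothetical.

```python
import bisect
import math
import random

def delta_comb(waiting_times, delta_t):
    """Time-splitting estimate of eqs. (14)-(15): returns (rates, weights)."""
    n_total = len(waiting_times)
    rates, weights = [], []
    count, total = 0, 0.0
    for tau in waiting_times:
        count += 1
        total += tau
        if total > delta_t:            # block j is complete: N_j = count, T_j = total
            rates.append(count / total)
            weights.append(count / n_total)
            count, total = 0, 0.0      # the next block starts after tau_{N_j}
    if count > 0 and total > 0:        # leftover incomplete block (assumption)
        rates.append(count / total)
        weights.append(count / n_total)
    return rates, weights

def comb_survival(tau, rates, weights):
    """Eq. (13): Psi_M(tau) = sum_j a_j exp(-lambda_j tau)."""
    return sum(a * math.exp(-lam * tau) for lam, a in zip(rates, weights))

def ks_distance(waiting_times, rates, weights):
    """Max |Psi_M - empirical Psi|, checked on both sides of every data point."""
    srt = sorted(waiting_times)
    n = len(srt)
    d = 0.0
    for i, tau in enumerate(srt):
        model = comb_survival(tau, rates, weights)
        before = (n - i) / n                              # just below tau
        at = (n - bisect.bisect_right(srt, tau)) / n      # at tau
        d = max(d, abs(model - before), abs(model - at))
    return d

def best_delta_t(waiting_times, candidates):
    """Window with the smallest KS distance (largest KS probability)."""
    scores = {dt: ks_distance(waiting_times, *delta_comb(waiting_times, dt))
              for dt in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical session: a busy regime (0.2/s) followed by a quiet one (0.02/s).
rng = random.Random(1)
taus = [rng.expovariate(0.2) for _ in range(5000)]
taus += [rng.expovariate(0.02) for _ in range(500)]
rates, weights = delta_comb(taus, 1500.0)
print(len(rates), sum(weights))
dt_best, scores = best_delta_t(taus, [500.0, 1500.0, 5000.0])
print(dt_best, scores)
```

With this construction, $\sum_j a_j/\lambda_j = \sum_j T_j/N$ is exactly the sample mean waiting time. This identity gives a quick consistency check on any implementation.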



3 Results 

In order to calibrate the methods, we first applied them to synthetic data
sets drawn from an exponential or a Mittag-Leffler distribution [11,12].
Here, however, we only report activity-spectrum estimates for real
market data.

As in ref. [2], we have considered NYSE General Electric tick-by-tick data of
October 1999. After filtering the data, 55559 waiting times were recorded,
with a mean $\langle\tau\rangle \simeq 8.85$ s. The empirical survival function is shown in Fig. 1.

Fig. 1. Survival function for GE OCT 99 data. The solid line represents an exponential
fit with $\lambda = 1/\langle\tau\rangle$.

Fig. 2. General Electric data analysis. KS probability as a function of the time
interval $\Delta T$.

Fig. 3. Dotted line: empirical survival function. Solid line: survival function built by
means of the time-splitting method with $\Delta T = 1500$ s.

Fig. 4. KS probability vs $\mu$ for the Tikhonov method. The optimal
value of $\mu$ is around 0.654.

The matrix in eq. (6) has one free parameter, $h$, which defines the range of $\lambda$ over
which we can evaluate the function $g(\lambda)$. In our case, we set $h = 0.0015$ based on
a preliminary analysis with the Dirac delta comb method. The size of the
matrix is $\tau_{\max} \times \tau_{\max} = 196 \times 196$. Therefore, our spectrum ranges from
$\lambda_1 = h$ to $\lambda_{196} = 196\, h$, that is, up to about 2.5 times $\lambda = 1/\langle\tau\rangle$.
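As a check on this range, the endpoints follow directly from $\lambda_i = h\, i$ and the numbers given in the text. This assumes waiting times are measured in seconds, so that $h$ is in s$^{-1}$:

$$
\lambda_1 = h = 0.0015\ \mathrm{s}^{-1}, \qquad
\lambda_{196} = 196\, h \approx 0.294\ \mathrm{s}^{-1}, \qquad
\frac{1}{\langle\tau\rangle} \approx \frac{1}{8.85\ \mathrm{s}} \approx 0.113\ \mathrm{s}^{-1},
$$
$$
\frac{\lambda_{196}}{1/\langle\tau\rangle} = 196\, h\, \langle\tau\rangle \approx 2.6, \qquad
\frac{\lambda_1}{1/\langle\tau\rangle} = h\, \langle\tau\rangle \approx 0.013 .
$$

The grid therefore spans roughly 0.013 to 2.6 times the mean activity, which agrees with the rough factor of 2.5 quoted above.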

For the time-splitting method, we found that the optimal value of $\Delta T$ is around
1500 s (see Fig. 2). In Fig. 3, we present the corresponding rebuilt survival function.

For Tikhonov's method, Fig. 4 shows the goodness of fit, measured by the KS test,
as a function of the parameter $\mu$. Using this criterion, the optimal value is
$\mu \simeq 0.654$. Fig. 5 shows the corresponding optimal reconstruction of the
empirical survival function.

Fig. 5. Dotted line: empirical survival function. Solid line: survival function rebuilt
using the Tikhonov method with $\mu = 0.654$.



4 Summary and conclusions 

By assuming that the survival function of intertrade durations can be written
as a mixture of exponential distributions, we have proposed two methods to
reconstruct the activity spectrum. The first method is based on Tikhonov's
regularization. The second method uses an ansatz of intervals of constant
activity. In this paper, we have not given any rigorous convergence proof, and
the methods outlined above are purely heuristic.

The code used for this paper is available from [13] or can be obtained
from the authors. Unfortunately, for copyright reasons, we cannot publish the
market data set, but we can provide a full synthetic data set based on the
Mittag-Leffler function. More details about the methods will be available in a
forthcoming paper as well as in the PhD thesis of Mauro Politi.